FUSION PROTEINS, THEIR PREPARATION AND USE

This is a continuation-in-part of co-pending U.S.
Patent Application Serial No. 07/399,874, filed August
29, 1989.

FIELD OF THE INVENTION 

The present invention relates to fusion proteins 
and a process for preparing fusion proteins. The
invention also pertains to various oligonucleotide and 
amino acid sequences which make up fusion proteins of 
the present invention. 

BACKGROUND OF THE INVENTION

Proteins which, in addition to the desired
protein, also have an undesirable constituent or
"ballast" constituent in the end product are referred
to as fusion proteins. When proteins are prepared by
genetic engineering, the intermediate stage of a 
fusion protein is utilized particularly if, in direct 
expression, the desired protein is decomposed 
relatively rapidly by host-endogenous proteases, 
causing reduced or entirely inadequate yields of the 
desired protein. 

The magnitude of the ballast constituent of the 
fusion protein is usually selected in such a manner 
that an insoluble fusion protein is obtained. This 
insolubility not only provides the desired protection 
against the host-endogenous proteases but also permits 
easy separation from the soluble cell components. It 
is usually accepted that the proportion of the desired 
protein in the fusion protein is relatively small,
i.e. that the cell produces a relatively large
quantity of "ballast".

The preparation of fusion proteins with a short
ballast constituent has been attempted. For example,
a gene fusion was prepared which codes for a fusion
protein from the first ten amino acids of
β-galactosidase and somatostatin. However, it was
observed that this short amino acid chain did not
adequately protect the fusion protein against
decomposition by the host-endogenous proteases (US-A
4 366 246, Column 15, Paragraph 2).

From EP-A 0 290 005 and 0 292 763, we know of
fusion proteins, the ballast constituent of which
consists of a β-galactosidase fragment with more than
250 amino acids. These fusion proteins are insoluble,
but they can easily be rendered soluble with urea
(EP-A 0 290 005).

Although fusion proteins have been described in 
the art, the generation of fusion proteins with 
desirable traits such as protease resistance is a 
laborious procedure and often results in fusion 
proteins that have a number of undesirable 
characteristics. Thus, a need exists for an efficient 
process for producing fusion proteins with a number of 
attractive traits including protease resistance, 
proper folding, and effective cleavage of the ballast 
from the desired protein. 

SUMMARY OF THE INVENTION 

The present invention relates to a process for 
the preparation of fusion proteins. Fusion proteins 
of the present invention contain a desired protein and 
a ballast constituent. The process of the present 
invention involves generating an oligonucleotide
library (mixture) coding for ballast constituents,
inserting the mixed oligonucleotide (library) into a
vector so that the oligonucleotide is functionally
linked to a regulatory region and to the structural
gene coding for the said desired protein, and
transforming host cells with the so-obtained vector
population. Transformants are then selected which
express a fusion protein in high yield.

The process of the present invention further 
includes oligonucleotide coding for an amino acid or 
for a group of amino acids which allows an easy 
cleavage of the desired protein from the said ballast 
constituent. The cleavage may be enzymatic or 
chemical. 

The invention also pertains to an oligonucleotide 
designed so that it leads to an insoluble fusion 
protein which can easily be solubilized. Fusion 
proteins of the present invention thus fulfill the 
requirements established for protease resistance. 

Furthermore, oligonucleotide of the present 
invention may be designed so that the ballast 
constituent does not interfere with folding of the 
desired protein. 

BRIEF DESCRIPTION OF THE DRAWINGS 

Figure 1 and its continuation in Figure 1a and
Figure 1b show the construction of plasmid population
(gene bank) pINT4x from the known plasmid pH154/25*
via plasmid pINT40. Other constructions have not been
graphically presented because they are readily
apparent from the figures.

Figure 2 is a map of plasmid pUH10 containing the
complete HMG CoA reductase gene.

Figures 3 and 3a show construction of pIK4, a
plasmid containing the mini-proinsulin gene.

DETAILED DESCRIPTION OF THE INVENTION 

The invention relates to a process for the
preparation of a fusion protein characterized in that
a mixed oligonucleotide is constructed which codes for
the ballast constituent of the fusion protein. The
oligonucleotide mixture is introduced in a vector in
such a manner that it is functionally linked to a
regulatory region and to the structural gene for the
desired protein. Appropriate host cells are
transformed with the plasmid population obtained in
this manner, and the clones producing a high yield of
coded fusion protein are selected. Advantageous
embodiments of this invention are explained below.

The oligonucleotide advantageously codes at the
3'-end for an amino acid or a group of amino acids which
permits easy and preferably enzymatic
cleavage of the ballast constituent from the desired
protein. According to another embodiment, an
oligonucleotide is constructed that yields an
insoluble fusion protein which can easily be made
soluble. In particular, an oligonucleotide is
preferably constructed which codes for a ballast
constituent that does not disturb the folding of the
desired protein.

For practical reasons, the construction,
according to the invention, of the oligonucleotide for
the ballast constituent causes the latter to be very
short.

It was surprising to observe that, even when they
have an extremely short ballast constituent, fusion
proteins not only fulfill the requirements established
for protease resistance, but are also produced at a
high expression rate and, if the fusion
protein is insoluble, can easily be rendered soluble.
In the dissolved or soluble state, the short ballast
constituent according to the invention then permits a
sterically favorable conformation of the desired
protein so that it can be properly folded and easily
separated from the ballast constituent.

If the desired protein is formed in a pro-form, 
the ballast constituent can be constituted in such a 
manner that its cleavage can occur concomitantly with 
the transformation of the pro-protein into the mature 
protein. In insulin preparation, for example, the 
ballast constituent and the C chain can be removed 
simultaneously, yielding a derivative of the mature 
insulin which can be transformed into insulin without 
any side reactions involving much loss. 

The short ballast constituent according to the 
invention is actually shorter than the usual signal 
sequences of proteins and does not disturb the folding 
of the desired protein. It therefore need not be 
eliminated prior to the final processing step yielding 
the mature protein. 

The oligonucleotide coding for the ballast
constituent preferably contains the DNA sequence
(coding strand)

(DCD)x

in which D stands for A, G or T and x is 4-12,
preferably 4-8.
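
The degenerate triplet DCD can be made concrete by enumeration. The following Python sketch is an illustration only and is not part of the original disclosure: the names D_BASES, CODON_TO_AA, dcd_codons and library_sizes are hypothetical, and the codon assignments are those of the standard genetic code. It lists the nine codons covered by DCD and counts the DNA and peptide variants encoded by (DCD)x.

```python
from itertools import product

# D is the IUPAC ambiguity symbol for A, G or T (i.e. "not C").
D_BASES = "AGT"

# Standard genetic code, restricted to the nine codons of the form DCD.
CODON_TO_AA = {
    "ACA": "Thr", "ACG": "Thr", "ACT": "Thr",
    "GCA": "Ala", "GCG": "Ala", "GCT": "Ala",
    "TCA": "Ser", "TCG": "Ser", "TCT": "Ser",
}


def dcd_codons():
    """Return every concrete codon matching the pattern DCD."""
    return [first + "C" + third for first, third in product(D_BASES, repeat=2)]


def library_sizes(x):
    """Return (DNA variants, peptide variants) encoded by (DCD)x."""
    codons = dcd_codons()
    n_codons = len(codons)                          # 9 codons
    n_aas = len({CODON_TO_AA[c] for c in codons})   # 3 amino acids
    return n_codons ** x, n_aas ** x


if __name__ == "__main__":
    for codon in dcd_codons():
        print(codon, CODON_TO_AA[codon])
    print(library_sizes(4))
```

Because only Ser, Thr and Ala are reachable, the peptide diversity grows as 3^x while the DNA diversity grows as 9^x, which is one way to read the statement that the selection of amino acids remains within manageable scope.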

In particular, the oligonucleotide is
characterized by the DNA sequence (coding strand)

ATG (DCD)y (NNN)z

in which N in the NNN triplet stands for identical or
different nucleotides, excluding stop codons, z is 1-4
and y+z is 6-12, preferably 6-10, wherein y is at
least 4. It has proved advantageous for the
oligonucleotide to have the DNA sequence (coding
strand)

ATG (DCD)5-8 (NNN)

especially if it has the DNA sequence (coding strand)

ATG GCW (DCD)4-8 CGW

or, advantageously,

ATG GCA (DCD)[illegible] CGW

in which W stands for A or T.

The above-mentioned DNA model sequences fulfill
all of these requirements. Codon DCD codes for the amino
acids serine, threonine and alanine and therefore for
a relatively hydrophilic protein chain. Stop codons
are excluded and selection of the amino acids remains
within manageable scope. The following is a
particularly preferable embodiment of the DNA sequence
for the ballast constituent, especially if the desired
protein is proinsulin:

ATG GCW (DCD)y' ACG CGW

or

ATG GCD (DCD)y' ACG CGT

in which y' signifies 3 to 6, especially 4 to 6.

The second codon, GCD, codes for alanine and
completes the recognition sequence for the restriction
enzyme NcoI, provided that the anterior regulation
sequence ends with CC. The next to last triplet codes
for threonine and, together with the codon CGT for
arginine, represents the recognition sequence for
restriction enzyme MluI. Consequently, this
oligonucleotide can be easily and unambiguously
incorporated in gene constructions.
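
The restriction-site argument can be checked by brute force. The sketch below is an illustration under stated assumptions, not the applicants' procedure: it expands the pattern ATG GCD (DCD)y' ACG CGT, prefixes the upstream dinucleotide CC, and confirms that every member carries the NcoI site CCATGG and the MluI site ACGCGT. The function names expand, ballast_pattern and all_have_sites are hypothetical.

```python
from itertools import product

D_BASES = "AGT"     # IUPAC D: A, G or T
NCOI = "CCATGG"     # NcoI recognition sequence
MLUI = "ACGCGT"     # MluI recognition sequence


def expand(pattern):
    """Expand every D in the pattern into all concrete sequences."""
    choices = [D_BASES if base == "D" else base for base in pattern]
    return ["".join(p) for p in product(*choices)]


def ballast_pattern(y_prime):
    """Build ATG GCD (DCD)y' ACG CGT as a single string."""
    return "ATG" + "GCD" + "DCD" * y_prime + "ACG" + "CGT"


def all_have_sites(y_prime, upstream="CC"):
    """True if every library member, preceded by the upstream bases,
    contains both the NcoI and the MluI recognition sequence."""
    members = (upstream + seq for seq in expand(ballast_pattern(y_prime)))
    return all(NCOI in m and MLUI in m for m in members)


if __name__ == "__main__":
    print(len(expand(ballast_pattern(3))), all_have_sites(3))
```

For y' = 3 the pattern has seven degenerate positions, so 3^7 sequences are checked; larger y' values work the same way but enumerate more members.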

The (NNN)z group codes in the 3' position for an
amino acid or a group of amino acids that permits
simple, and preferably enzymatic, separation of the
ballast constituent from the subsequent desired
protein. It is expedient to select the nucleotides in
this group in such a manner that at the 3'-end they
code the cleavage site of a restriction enzyme which
permits linkage of the structural gene for the desired
protein. It is also advantageous for the ATG start
codon and if necessary the first DCD triplet to be
incorporated into the recognition sequence of a
restriction enzyme so that the gene for the ballast
constituent according to the invention can easily be
inserted in the usual vectors.

The upper limit of z is obtained on the one hand 
from the desired cleavage site for (enzymatic) 
cleavage of the fusion protein obtained, i.e. it 
encompasses codons, for example, for the amino acid
sequence Ile-Glu-Gly-Arg, in case cleavage is to be 
carried out with factor Xa. In general, the upper 
limit for the sum of y and z is 12, since the ballast 
constituent should of course be as small as possible 
and, above all, not interfere with the folding of the 
desired protein. 
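
The length limits stated for y and z can be written down as a small check. This is a minimal sketch assuming the general limits given in the text (y at least 4, z from 1 to 4, y+z from 6 to 12); the function name is_valid_ballast and the constant FACTOR_XA_SITE are hypothetical.

```python
# Factor Xa recognition sequence named in the text; it occupies z = 4 codons.
FACTOR_XA_SITE = ["Ile", "Glu", "Gly", "Arg"]


def is_valid_ballast(y, z):
    """Check the general limits for ATG (DCD)y (NNN)z:
    y >= 4, 1 <= z <= 4 and 6 <= y + z <= 12."""
    return y >= 4 and 1 <= z <= 4 and 6 <= y + z <= 12


if __name__ == "__main__":
    z = len(FACTOR_XA_SITE)
    # With a four-codon cleavage site, y may range from 4 to 8.
    print([y for y in range(1, 13) if is_valid_ballast(y, z)])
```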

For reasons of expediency, bacteria or lower
eukaryotic cells such as yeasts are preferred as the
host organism in genetic engineering processes, 
provided that higher organisms are not required. In 
these processes, the expression of the heterologous 
gene is regulated by a homologous regulatory region, 
i.e. one that is intrinsic to the host or compatible 
with the host cell. If a pre-peptide is expressed, it 
often occurs that the pre-sequence is also 
heterologous to the host cell. In practice, this
lack of "sequence harmony" frequently results in
variable and unpredictable protein yields. Since the
ballast sequence according to the invention is adapted
to its environment, the selection process according to
the invention yields a DNA construction characterized
by this "sequence harmony".

The beginning and end of the ballast constituent
are set in this construction: Methionine is at the
beginning, and an amino acid or a group of amino acids
that permit the desired separation of the ballast
constituent from the desired protein is at the end.
If, for example, the desired protein is proinsulin,
a triplet coding for arginine is advantageously
selected as the last NNN codon, as this permits the
particularly favorable simultaneous cleaving off of
the ballast constituent with the removal of the C
chain. Of course, the end of the ballast constituent
can also be an amino acid or a group of amino acids
which allows a chemical cleavage, e.g. methionine, so
that cleavage is possible with cyanogen bromide or
chloride.

The intermediate amino acid sequence should be as 
short as possible so that folding of the desired 
protein is not affected. Moreover, this chain should 
be relatively hydrophilic so that solubilization is 
facilitated with undissolved fusion proteins and the 
fusion protein remains soluble. Cysteine residues are 
undesirable since they can interfere with the 
formation of the disulfide bridges. 

The DNA coding for the ballast constituent is 
synthesized in the form of a mixed oligonucleotide; it 
is incorporated in a suitable expression plasmid 
immediately in front of the structural gene for the 
desired protein and E. coli is transformed with the 
gene bank obtained in this manner. Appropriate gene
structures can be obtained in this way by the
selection of bacterial clones that produce
corresponding fusion proteins.

It was previously mentioned that the cleavage
sites for the restriction enzymes at the beginning and
end of the nucleotide sequence coding for the ballast
constituent are to be regarded as examples only.
Recognition sequences that encompass starting codon
ATG and in which any nucleotides that follow may
include the codon for suitable amino acids are, by way
of example, also those for restriction enzymes AflIII,
NdeI, NlaIII, NspHI or StyI. Since in the preferred
embodiment arginine is to be at the end of the ballast
sequence and since there are six different codons for
arginine, additional appropriate restriction enzymes
can also be found here for use instead of MluI, i.e.,
NruI, AvrII, AflIII, ClaI or HaeII.

However, it is also advantageous to use a
"polymerase chain reaction" (PCR) according to Saiki,
R.K. et al., Science 239:487-491, 1988, which can
dispense with the construction of specific recognition
sites for restriction enzymes.

It was previously indicated that limitation to
the DNA sequence (DCD)x is for reasons of expediency
and that this does not rule out other codons such as,
for example, those for glycine, proline, lysine,
methionine or asparagine.

The most efficient embodiment of this DNA
sequence is obtained by selection of good producers of
the fusion protein, i.e., the fusion protein
containing proinsulin. This yields the most favorable
combination of regulation sequence, ballast sequence
and desired protein, as a result of which unfavorable
combinations of promoter, ballast sequence and
structural gene are avoided and good results are
obtained with minimum expenditure in terms of the
above-mentioned "sequence harmony".

Surprisingly, it was observed that the genes
optimized for the ballast constituent according to the
invention do not always contain the triplets preferred
by E. coli. It was found that for Thr, codon ACA,
which is used least frequently by E. coli, actually
occurs frequently in the selected sequences. If, for
example, the following amino acid sequence were
optimized according to the preferred codon usage
(p.c.u.) by E. coli (p.c.u.: Aota, S. et al., Nucleic
Acids Research 16 (supplement): r315, r316, r391, r402
(1988)), we would obtain a totally different gene
structure than that obtained according to the
invention (cf. Table 1):

Ala Thr Thr Ser Thr Ala Thr Thr 

GCG ACC ACC AGC ACC GCG ACC ACC p.c.u. 

GCA ACA ACA TCA ACA GCA ACT ACG invention 
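
The comparison above can be checked mechanically. The following Python sketch, offered as an illustration rather than as part of the disclosure, translates both DNA rows with the standard genetic code and counts how many codons differ; the names PCU_DNA, INVENTION_DNA, translate and differing_codons are hypothetical.

```python
# Standard genetic code entries for the codons that occur in the two rows.
CODON_TO_AA = {
    "GCG": "Ala", "GCA": "Ala",
    "ACC": "Thr", "ACA": "Thr", "ACT": "Thr", "ACG": "Thr",
    "AGC": "Ser", "TCA": "Ser",
}

PCU_DNA = "GCG ACC ACC AGC ACC GCG ACC ACC"        # preferred codon usage row
INVENTION_DNA = "GCA ACA ACA TCA ACA GCA ACT ACG"  # selected sequence row


def translate(dna):
    """Translate a space-separated codon string into three-letter residues."""
    return [CODON_TO_AA[codon] for codon in dna.split()]


def differing_codons(a, b):
    """Count the positions at which two codon strings use different codons."""
    return sum(x != y for x, y in zip(a.split(), b.split()))


if __name__ == "__main__":
    print(translate(PCU_DNA) == translate(INVENTION_DNA))
    print(differing_codons(PCU_DNA, INVENTION_DNA))
```

Both rows encode the same eight residues although every codon differs, which is the sense in which the selected gene structure is "totally different" from the codon-optimized one.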

In the case of the fusion proteins with a
proinsulin constituent, the initial starting point was
a ballast constituent with 10 amino acids. The DNA
sequence of the best producer then served as the base
for variations in this sequence, whereupon it was
noted that up to 3 amino acids can be eliminated
without a noticeable loss in the relative expression
rate. This finding is not only surprising, since it
was unexpected that such a short ballast protein would
be adequate, but also very advantageous since of
course the relative proportion of proinsulin in the
fusion protein increases as the ballast constituent
decreases.

The significance of the ballast constituent in
the protein is apparent from the following comparison:
Human proinsulin contains 86 amino acids. If, for a
fusion protein according to EP-A 0 290 005, we take
the lower limit of 250 amino acids for the ballast
constituent, the fusion protein has 336 amino acids,
only about one quarter of which occur in the desired
protein. By comparison, a fusion protein according
to the invention with only 7 amino acids in the
ballast constituent has 93 amino acids; the proinsulin
constituent amounts to 92.5%. If the desired protein
has many more amino acids than the proinsulin, the
relationship between ballast and desired protein
becomes even more favorable.
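
The arithmetic behind this comparison is a simple ratio of residue counts. A minimal sketch, with the hypothetical helper name desired_fraction, reproduces the figures quoted above and extends them to the 461-residue HMG domain mentioned in the text:

```python
def desired_fraction(desired_len, ballast_len):
    """Fraction of residues in the fusion protein that belong to the
    desired protein, counted by number of amino acids."""
    return desired_len / (desired_len + ballast_len)


if __name__ == "__main__":
    # Human proinsulin (86 residues) with a 250-residue ballast.
    print(round(100 * desired_fraction(86, 250), 1))
    # Human proinsulin with a 7-residue ballast.
    print(round(100 * desired_fraction(86, 7), 1))
    # A larger desired protein (461 residues) with the same 7-residue ballast.
    print(round(100 * desired_fraction(461, 7), 1))
```

The ratio is by residue count, not by mass; a mass-based figure would differ slightly because the ballast residues are small.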

It has been mentioned on a number of occasions
that as a desired protein proinsulin represents only
one preferred embodiment of the invention. However,
the invention also works with much larger fusion
proteins for which a fusion protein with the active
domain of human 3-hydroxy-3-methylglutaryl-coenzyme
A-reductase (HMG) is mentioned as an example. This
protein contains 461 amino acids. A gene coding for
the latter is known e.g. from EP-A 292 803.

Having now generally described the invention, the
same will be more readily understood through reference
to the following examples which are provided by way of
illustration, and are not intended to be limiting of
the present invention, unless specified.

Example 1 

Construction of the gene bank and selection of a clone
with high expression

If not otherwise indicated, all media are
prepared according to Maniatis, T.; Fritsch, E. F. and
Sambrook, J.: Molecular Cloning, Cold Spring Harbor
Laboratory (1982). TP medium consists of M9CA medium
but with a glucose and casamino acid content of 0.4%
each. If not otherwise indicated, all media contain
50 µg/ml ampicillin. Bacterial growth during
fermentation is determined by measurement of the
optical density of the cultures at 600 nm (OD).
Percentage data refer to weight if no other data is
reported.

The starting material is plasmid pH154/25*
(Figure 1), which is known from EP-A 0 211 299, herein
incorporated by reference. This plasmid contains a
fusion protein gene (D'-Proin) linked to a
trp-promoter and a resistance gene for resistance
against the antibiotic ampicillin (Amp). The fusion
protein gene codes for a fusion protein that contains a
fragment of the trpD-protein from E. coli (D') and
monkey proinsulin (Proin). The gene structure of the
plasmid results in a polycistronic mRNA, which codes
for both the fusion protein and the resistance gene
product. To suppress the formation of excess
resistance gene product, initially the (commercial)
trp-transcription terminator sequence (trpTer) (2) is
introduced between the two structural genes. To do
so, the plasmid is opened with EcoRI and the
protruding ends are filled in with Klenow polymerase.
The resulting DNA fragment with blunt ends is linked
with the terminator sequence (2)

5' AGCCCGCCTAATGAGCGGGCTTTTTTTT 3'
3' TCGGGCGGATTACTCGCCCGAAAAAAAA 5'   (2)

which results in plasmid pINT12 (Figure 1-(3)).

The starting plasmid pH154/25* contains a
cleavage site for enzyme PvuI in the Amp gene, as well
as a HindIII cleavage site in the carboxyterminal area
of the trpD-fragment. Both cleavage sites are
therefore also contained in pINT12. By cutting the
plasmid (Figure 1-(3)) with PvuI and HindIII, it is
split into two fragments from which the one containing
the proinsulin gene (Figure 1-(4)) is isolated.
Plasmid pGATTP (Figure 1a-(5)), which is structured in
an analogous manner to (3) but which instead of the
D'-Proin gene carries a gamma-interferon gene (Ifn)
containing restriction cleavage sites NcoI and
HindIII, is also cut with PvuI and HindIII and the
fragment (Figure 1a-(6)) with the promoter region is
isolated. By ligation of this fragment (6) with the
fragment (4) obtained from (3), we acquire plasmid
pINT40 (Figure 1a-(7)). The small fragment with the
remainder of the gamma-interferon gene is cut from the
latter with NcoI and MluI. The large fragment (Figure
1b-(8)) is ligated with mixed oligonucleotide (9)

5' CATGGCDDCDDCDDCDDCDDCDDCDA 3'
3'     CGHHGHHGHHGHHGHHGHHGHTGCGC 5'   (9)

in which D stands for A, G or T and H signifies the
complementary nucleotide. This results in plasmid
population (gene bank) pINT4x (Figure 1b-(10)). Mixed
oligonucleotides of the present invention may be
obtained by techniques well known to those of skill in
the art.
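
The structure of mixed oligonucleotide (9) can be verified with a short script. This is a sketch under stated assumptions, not a description of how the oligonucleotide was designed: it treats H as the base-by-base complement of D, appends the four bases CGCG that pair with the MluI overhang of the lower strand, and counts the DNA variants in the library. The names TOP_5TO3, BOTTOM_3TO5, complement and dna_variants are hypothetical.

```python
# Strands of mixed oligonucleotide (9) as printed in the text.
TOP_5TO3 = "CATGGCDDCDDCDDCDDCDDCDDCDA"
BOTTOM_3TO5 = "CGHHGHHGHHGHHGHHGHHGHTGCGC"

# Base-by-base complement; D (A, G or T) pairs with H (T, C or A).
COMPLEMENT = str.maketrans("ACGTDH", "TGCAHD")


def complement(seq):
    """Return the complementary strand, read in the opposite polarity."""
    return seq.translate(COMPLEMENT)


def dna_variants(seq):
    """Number of concrete sequences encoded by a strand containing D."""
    return 3 ** seq.count("D")


if __name__ == "__main__":
    # The upper strand starts with the 4-base NcoI overhang CATG, which has
    # no partner; the lower strand ends with the MluI overhang, which pairs
    # with the bases CGCG that follow the upper strand in the vector.
    paired_region = TOP_5TO3[4:] + "CGCG"
    print(complement(paired_region) == BOTTOM_3TO5)
    print(dna_variants(TOP_5TO3))
```

The 13 degenerate positions give 3^13 DNA variants, which equals 3 x 9^6 for one GCD codon followed by six DCD codons.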

The mixed oligonucleotide (9) is obtained from
the synthetic mixed oligonucleotide (9a)

TTCGGGTACCGHHGHHGHHGHHGHHGHHGHTGCGCAG 5'
TTGCCCATGGC 3'   (9a)






WO 91/03550 PCT/US90/04840 






which is filled in with Klenow polymerase and cut with
MluI and NcoI.

The strain E. coli W3110 is transformed with the
plasmid population (10) and the bacteria are plated on
LB agar dishes. Six of the resulting bacterial clones
are tested for their ability to produce a fusion
protein with an insulin constituent. For this
purpose, overnight cultures of the clones are prepared
in LB medium and 100 µl aliquots of the cultures are
mixed with 10.5 ml TP medium and shaken at 37°C. At
OD600 = 1 the cultures are adjusted to 20 µg/ml
3-β-indolylacrylic acid (IAA), a solution of 40 mg
glucose in 100 ml water is added and the preparation
is shaken for another three hours at 37°C.
Subsequently 6 OD equivalents of the culture are
removed, the bacteria contained therein are harvested
by centrifugation and resuspended in 300 µl test
buffer (37.5 mM tris of pH 8.5, 7 M urea, 1% (w/v) SDS
and 4% (v/v) 2-mercaptoethanol). The suspension is
heated for five minutes, treated for two seconds with
ultrasound to reduce viscosity and aliquots thereof
are subsequently subjected to SDS-gel electrophoresis.
With bacteria that produce fusion protein, we can
expect a protein band with a molecular weight of
10,350 D. It is evident that one of the clones,
pINT41 (Table 1), produces an appropriate protein in
relatively large quantities while no such protein
formation is seen with the remaining clones. An
immune blot experiment with insulin-specific
antibodies confirms that the protein coded by pINT41
contains an insulin constituent.

Table 1 shows the DNA and amino acid sequence of
the ballast constituent for a number of plasmid
constructs. In particular, Table 1 illustrates the
DNA and amino acid sequence of the ballast constituent
in the pINT41 fusion protein.
Table 1

 1   2   3   4   5   6   7   8   9   10  11     pINT

Met Ala Thr Thr Ser Thr Ala Thr Thr Arg
ATG GCA ACA ACA TCA ACA GCA ACT ACG CGT         41

Thr Ser Thr
* * **g—A*T-T*G—A*G-* *G~* **™                  42

Ala Thr Ser Thr Ser
*** **t G** *** A*T T*T A*T T*A *** ***         43

Asn Ser
*** *** *** *** *** AAC T*A *** ***             60

*** *** *** *** *** ***                         67d

*** *** *** *** *** *** ***                     68d

*** *** *** *** *** *** **A                     69d, 72d

Gly Asn Ser Ala
*** *** *** *** *** *** *G* t** GCA **A         90d, 91d

*** *** *** *** *** *** AA* **A                 93d

Pro
*** *** *** *** *** *** c** **A                 94d

Met
*** *** *** *** *** *** ATG — **A               95d

Gly
*** *** *** *** *** *** *g* **A                 96d



Example 2 

Selection of additional clones 

To detect additional suitable clones, a method
according to Helfman, D.M. et al. (Proc. Natl. Acad.
Sci. USA 80:31-35, 1983) is used. TP-agar dishes, the
medium of which contains an additional 40 µg/ml IAA,
are utilized for this purpose. Fifteen minutes before
use, the agar surface of the plates is coated with a
2-mm thick TP top agar layer, a nitrocellulose filter
is placed on the latter and freshly transformed cells
are placed on the filter. Copies are made of the
filters which have grown bacteria colonies following
incubation at 37°C, and the bacteria from the original
filter are lysed. To accomplish this, the filters are
exposed to a chloroform atmosphere in a desiccator
for 15 minutes, subsequently moved slowly for six
hours at room temperature in immune buffer (50 mM tris
of pH 7.5, 150 mM NaCl, 5 mM MgCl2, and 3% (w/v) BSA),
which contains an additional 1 µg/ml DNase I and 40
µg/ml lysozyme, and then washed twice for five minutes
in washing buffer (50 mM tris of pH 7.5 and 150 mM
NaCl). The filters are then incubated overnight at 3°C
in immune buffer with insulin-specific antibodies,
washed four times for five minutes with washing
buffer, incubated for one hour in immune buffer with
a protein A-horseradish peroxidase conjugate, washed
again four times for five minutes with washing buffer
and colonies that have bound antibodies are visualized
with a color reaction. Clones pINT42 and pINT43,
which also produce fairly large quantities of fusion
protein, are found in this manner in 500 colonies.
The DNA obtained by sequencing and the amino acid
sequences derived from it have also been reproduced in
Table 1.

Example 3 

Preparation of plasmid pINT41d

Between the replication origin and the
trp-promoter, plasmid pINT41 contains a nonessential
DNA region which is flanked by cleavage sites for
enzyme Nsp(7524)I. To remove this region from the
plasmid, pINT41 is cut with Nsp(7524)I, and the larger
of the resulting fragments is isolated and religated.
This gives rise to plasmid pINT41d, the DNA sequence
of which is reproduced in Table 2.
Table 2: DNA-Sequence of Plasmid pINT41d

10 30 50

GTGTCATGGTCGGTGATCGCCAGGGTGCCGACGCGCATCTCGACTTGCACGGTGCACCAA 
70 90 110

TGCTTCTGGCGTCAGGCAGCCATCGGAAGCTGTGGTATGGCTGTGCAGGTCGTAAATCAC 
130 150 170

TGCATAATTCGTGTCGCTCAAGGCGCACTCCCGTTCTGGATAATGTTTTTTGCGCCGACA 
190 210 230

TCATAACGGTTCTGGCAAATATTCTGAAATGAGCTGTTGACAATTAATCATCGAACTAGT 
250 270 290 

TAACTAGTACGCAAGTTCAGGTAAAAAGGGTATCGACCATGGCAACAACATCAACAGCAA
310 330 350 

CTACGCGTTTCGTGAACCAGCACCTGTGCGGCTCCCACCTAGTGGAAGCTCTCTACCTGG 
370 390 410 

TGTGCGGGGAGCGAGGCTTCTTCTAGACACCCAAGACCCGCCGGGAGGCAGAGGACCCTC 
430 450 470 

AGGTGGGGCAGGTGGAGCTGGGCGGGGGCCCTGGCGCAGGCAGCCTGCAGCCCTTGGCGC 
490 510 530

TGGAGGGGTCCCTGCAGAAGCGCGGCATCGTGGAGCAGTGCTGCACCAGCATCTGCTCCC 
550 570 590 

TCTACCAGCTGGAGAACTACTGCAACTAATAGTCGACCTTTGCTTTCATTGTCGATGATA 
610 630 650 

AGCTGTCAAACATGAGAATTAGCCCGCCTAATGAGCGGGCTTTTTTTTAATTCTTGAAGA 
670 690 710 

CGAAAGGGCCTCGTGATACGCCTATTTTTATAGGTTAATGTCATGATAATAATGGTTTCT 
730 750 770 

TAGACGTCAGGTGGCACTTTTCGGGGAAATGTGCGCGGAACCCCTATTTGTTTATTTTTC 
790 810 830 

TAAATACATTCAAATATGTATCCGCTCATGAGACAATAACCCTGATAAATGCTTCAATAA 
850 870 890 

TATTGAAAAAGGAAGAGTATGAGTATTCAACATTTCCGTGTCGCCCTTATTCCCTTTTTT 
910 930 950 

GCGGCATTTTGCCTTCCTGTTTTTGCTCACCCAGAAACGCTGGTGAAAGTAAAAGATGCT 
970 990 1010 

GAAGATCAGTTGGGTGCACGAGTGGGTTACATCGAACTGGATCTCAACAGCGGTAAGATC 
1030 1050 1070 

CTTGAGAGTTTTCGCCCCGAAGAACGTTTTCCAATGATGAGCACTTTTAAAGTTCTGCTA 
1090 1110 1130

TGTGGCGCGGTATTATCCCGTGTTGACGCCGGGCAAGAGCAACTCGGTCGCCGCATACAC 
1150 1170 1190

TATTCTCAGAATGACTTGGTTGAGTACTCACCAGTCACAGAAAAGCATCTTACGGATGGC 
1210 1230 1250 

ATGACAGTAAGAGAATTATGCAGTGCTGCCATAACCATGAGTGATAACACTGCGGCCAAC 
1270 1290 1310 

TTACTTCTGACAACGATCGGAGGACCGAAGGAGCTAACCGCTTTTTTGCACAACATGGGG 
1330 1350 1370 

GATCATGTAACTCGCCTTGATCGTTGGGAACCGGAGCTGAATGAAGCCATACCAAACGAC 
1390 1410 1430 

GAGCGTGACACCACGATGCCTGCAGCAATGGCAACAACGTTGCGCAAACTATTAACTGGC 
1450 1470 1490

GAACTACTTACTCTAGCTTCCCGGCAACAATTAATAGACTGGATGGAGGCGGATAAAGTT 

1510 1530 1550

GCAGGACCACTTCTGCGCTCGGCCCTTCCGGCTGGCTGGTTTATTGCTGATAAATCTGGA 

1570 1590 1610 

GCCGGTGAGCGTGGGTCTCGCGGTATCATTGCAGCACTGGGGCCAGATGGTAAGCCCTCC 

1630 1650 1670 

CGTATCGTAGTTATCTACACGACGGGGAGTCAGGCAACTATGGATGAACGAAATAGACAG 

1690 1710 1730 

ATCGCTGAGATAGGTGCCTCACTGATTAAGCATTGGTAACTGTCAGACCAAGTTTACTCA 

1750 1770 1790 

TATATACTTTAGATTGATTTAAAACTTCATTTTTAATTTAAAAGGATCTAGGTGAAGATC 

1810 1830 1850 

CTTTTTGATAATCTCATGACCAAAATCCCTTAACGTGAGTTTTCGTTCCACTGAGCGTCA 

1870 1890 1910 

1930 1950 1970 

TGCTTGCAAACAAAAAAACCACCGCTACCAGCGGTGGTTTGTTTGCCGGATCAAGAGCTA 

1990 2010 2030 

CCAACTCTTTTTCCGAAGGTAACTGGCTTCAGCAGAGCGCAGATACCAAATACTGTCCTT 

2050 2070 2090 

CTAGTGTAGCCGTAGTTAGGCCACCACTTCAAGAACTCTGTAGCACCGCCTACATACCTC 

2110 2130 2150 

GCTCTGCTAATCCTGTTACCAGTGGCTGCTGCCAGTGGCGATAAGTCGTGTCTTACCGGG 

2170 2190 2210 

TTGGACTCAAGACGATAGTTACCGGTAAGGCGCAGCGGTCGGGCTGAACGGGGGGTTCGT 

2230 2250 2270 

GCACACAGCCCAGCTTGGAGCGAACGACCTACACCGAACTGAGATACCTACAGCGTGAGC 

2290 2310 2330 

ATTGAGAAAGCGCCACGCTTCCCGAAGGGAGAAAGGCGGACAGGTATCCGGTAAGCGGCA 

2350 2370 2390

GGGTCGGAACAGGAGAGCGCACGAGGGAGCTTCCAGGGGGAAACGCCTGGTATCTTTATA 

2410 2430 2450 

GTCCTGTCGGGTTTCGCCACCTCTGACTTGAGCGTCGATTTTTGTGATGCTCGTCAGGGG 

2470 2490 2510 

GGCGGAGCCTATGGAAAAACGCCAGCAACGCGGCCTTTTTACGGTTCCTGGCCTTTTGCT 

2530 2550 2570 

GGCCTTTTGCTCACATGTGTCAGAGGTTTTCACCGTCATCACCGAAACGCGCGAGGCAGC 



TG 
Example 4

Fermentation and processing of pINT41d fusion protein

(i) Fermentation: A shaking culture in LB medium
is prepared from E. coli W3110 transformed with
pINT41d. Fifteen µl of this culture, which has an OD
= 2, are then put into 15.7 l TP medium and the
suspension is fermented 16 hours at 37°C. The culture,
which [illegible]
20 µg/ml IAA, and until the end of fermentation, after
another five hours, a 50% (w/v) maltose solution is
continuously pumped in at a rate of 100 ml/hour. An
OD = 17.5 is attained in this process. At the end,
the bacteria are harvested by centrifugation.

(ii) Rupture of Cells: The cells are
resuspended in 400 ml disintegration buffer (10 mM
tris of pH 8.0, 5 mM EDTA) and disrupted in a French
press. The fusion protein containing insulin is
subsequently concentrated by 30 minutes of
centrifugation at 23,500 g and washed with
disintegration buffer. This yields 134 g sediment
(moist substance).

(iii) Sulfitolysis: 12.5 g sediment (moist
substance) from (ii) are stirred into 125 ml of an 8
M urea solution at 35°C. After stirring for thirty
minutes, the solution is adjusted to pH 9.5 with
sodium hydroxide solution and reacted with 1 g sodium
sulfite. After an additional thirty minutes of
stirring at 35°C, 0.25 g sodium tetrathionate is added
and the mixture is again stirred for thirty minutes at
35°C.
(iv) DEAE-Anion exchange chromatography: The
entire batch of (iii) is diluted with 250 ml buffer A
(50 mM glycine, pH 9.0) and placed on a chromatography
column which contains Fractogel(R) TSK DEAE-650 (column
volume 130 ml, column diameter 26 mm) equilibrated
with buffer A. After washing with buffer A, the
fusion protein-S-sulfonate is eluted with a salt
gradient consisting of 250 ml each buffer A and buffer
B (50 mM glycine of pH 9.0, 3 M urea and 1 M NaCl) at
a flow rate of 3 ml/minute. The fractions containing
fusion protein-S-sulfonate are then combined.

(v) Folding and enzymatic cleavage: The
combined fractions from (iv) are diluted at 4°C in a
volume ratio of 1 + 9 with folding buffer (50 mM
glycine, pH 10.7), and per liter of the resulting
dilution 410 mg ascorbic acid and 165 µl
2-mercaptoethanol are added at 4°C under gentle
stirring. After correction of the pH value to pH
10.5, stirring is continued for another 4 hours at 4°C.
Subsequently, solid
N-(2-hydroxyethyl)-piperazine-N'-2-ethanesulfonic acid
(HEPES) is added to an end
concentration of 24 g per batch-liter. The mixture,
which now has pH 8, is digested with trypsin at 25°C.
During the process, the enzyme concentration in the
digestion mixture is 80 µg/l. The cleavage course is
followed analytically by RP-HPLC. After two hours,
digestion can be stopped by addition of 130 µg soy
bean trypsin inhibitor. HPLC shows the formation of
19.8 mg di-Arg insulin from a mixture according to
(iii). The identity of the cleavage product is
confirmed by protein sequencing and comparative HPLC
with reference substances. The di-Arg insulin can be
chromatographically purified according to known
methods and transformed to insulin with
carboxypeptidase B.
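
The per-liter additions of step (v) can be scaled to any volume of pooled fractions. The following Python sketch is only an illustration of that arithmetic: the function name, the dictionary keys and the 50 ml pool volume are assumptions made for the example and do not come from the procedure itself, and the trypsin concentration is read as micrograms per liter.

```python
# Illustrative helper (hypothetical, not part of the original procedure):
# scales the per-liter additions of step (v) to a given pool volume.

def folding_additions(pooled_ml):
    """Return the reagent amounts for pooled_ml of combined DEAE fractions."""
    total_ml = pooled_ml * 10          # 1 part fractions + 9 parts folding buffer
    liters = total_ml / 1000.0
    return {
        "folding_buffer_ml": pooled_ml * 9,
        "total_volume_l": liters,
        "ascorbic_acid_mg": 410 * liters,     # 410 mg per liter of dilution
        "mercaptoethanol_ul": 165 * liters,   # 165 microliters per liter
        "hepes_g": 24 * liters,               # 24 g per batch-liter
        "trypsin_ug": 80 * liters,            # assumed 80 micrograms per liter
    }


if __name__ == "__main__":
    # 50 ml is a made-up pool volume used only to exercise the function.
    for name, amount in folding_additions(50).items():
        print(f"{name}: {amount:g}")
```

The volume contributed by the solid HEPES is ignored here, which is a simplification.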

Example 5 

Construction of plasmid pINT60 

Plasmid pINT60 results in an insulin precursor,
the ballast sequence of which consists of only nine
amino acids. For construction of this plasmid,
plasmid pINT40 is cut with Nco and MluI and the
resulting vector fragment is isolated. The
oligonucleotide Insu15

3' TTCGGGTACCGTTGTTGTAGTTTGAGTTGCGCAG 5'
5' TTGCCCATGGC 3'

is then synthesized, filled in with Klenow polymerase
and also cut with these two enzymes. The resulting
DNA fragment is then ligated with the vector fragment
to yield plasmid pINT60.

Table 1 shows the DNA and amino acid sequence of 
the ballast constituent in this fusion protein. 

Example 6 

Construction of plasmid pINT67d 

Plasmid pINT67d is a derivative of pINT41d in
which the codon of the amino acid in position nine of
the ballast sequence is deleted. That is why, like
pINT60, it results in an insulin precursor with a
ballast sequence of nine amino acids. A method
according to Ho, S.N. et al. (Gene 77:51-59, 1989) is
used for its construction. For this purpose, two
separate PCRs are first performed with plasmid
pINT41d and the two oligonucleotide pairs

TIR: 5'-CTG AAA TGA GCT GTT GAC-3'

and

DTR8: 5'-CAC AAA TCG AGT TGC TGT TGA TGT TGT-3'

or

DTR9: 5'-ACA GCA ACT CGA TTT GTG AAC CAG CAC-3'

and

Insu11: 5'-TCA TGT TTG ACA GCT TAT CAT-3'.

This produces two fragments that are partially
complementary to each other and, when annealed with
each other, code for an insulin precursor similar to
that of pINT41d in which, however, the amino acid in
position nine is absent. For completion, the two
fragments are combined and subjected to another PCR
together with the oligonucleotides TIR and Insu11.
From the DNA fragment obtained in this manner, the
structural gene of the insulin precursor is liberated
with Nco and SalI and purified. Plasmid pINT41d is
then also cut with these two enzymes, the vector
fragment is purified and subsequently ligated with the
structural gene fragment from the PCR to yield plasmid
pINT67d.

The nucleotide and amino acid sequences for the 
ballast region have been reproduced in Table 1. 
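
The partial complementarity of the two PCR fragments follows from the inner primers: the reverse complement of DTR8 and the 5' end of DTR9 share a common stretch. The Python sketch below is only an illustration of that check; the helper names are assumptions made for the example.

```python
# Sketch: find the stretch shared by the two inner primers of the
# overlap-extension PCR (helper names are illustrative).

def reverse_complement(seq):
    pairs = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(pairs[base] for base in reversed(seq))


def longest_overlap(left, right):
    """Length of the longest suffix of `left` that is a prefix of `right`."""
    for n in range(min(len(left), len(right)), 0, -1):
        if left.endswith(right[:n]):
            return n
    return 0


DTR8 = "CACAAATCGAGTTGCTGTTGATGTTGT"   # antisense inner primer
DTR9 = "ACAGCAACTCGATTTGTGAACCAGCAC"   # sense inner primer

overlap = longest_overlap(reverse_complement(DTR8), DTR9)
shared = DTR9[:overlap]
print(overlap, shared)   # 18 ACAGCAACTCGATTTGTG
```

The 18 shared nucleotides are what allows the two fragments to anneal and be extended in the second PCR with the outer primers.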

Example 7

Construction of plasmid pINT68d 

Like plasmid pINT67d, plasmid pINT68d is a
shortened derivative of plasmid pINT41d in which the
codons of the two amino acids in positions eight and
nine of the ballast sequence are deleted. It
therefore results in an insulin precursor with a
ballast sequence of only eight amino acids. The
procedure previously described in Example 6 is used
for its construction but with the two oligonucleotide
pairs

TIR: 5'-CTG AAA TGA GCT GTT GAC-3'

and

DTR10: 5'-CAC AAA TCG TGC TGT TGA TGT TGT TGC-3'

or

DTR11: 5'-TCA ACA GCA CGA TTT GTG AAC GAG CAC-3'

and

Insu11: 5'-TCA TGT TTG ACA GCT TAT CAT-3'.

The nucleotide and amino acid sequences for the
ballast region have been reproduced in Table 1.

Example 8 

Construction of plasmid pINT69d 

Plasmid pINT69d is also a shortened derivative of
plasmid pINT41d in which, however, the codons of the
three amino acids in positions seven, eight and nine
of the ballast sequence have been deleted. It
therefore results in an insulin precursor with a
ballast sequence of only seven amino acids. The
procedure described in Example 6 is also used for its
construction but with the two oligonucleotide pairs

TIR: 5'-CTG AAA TGA GCT GTT GAC-3'

and

DTR12: 5'-CAC AAA TCG TGT TGA TGT TGT TGC CAT-3'

or

DTR13: 5'-ACA TCA ACA CGA TTT GTG AAC CAG CAC-3'

and

Insu11: 5'-TCA TGT TTG ACA GCT TAT CAT-3'.

The nucleotide and amino acid sequences for the
ballast region have been reproduced in Table 1.
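
The stepwise shortening of the ballast in Examples 6 to 8 can be read off the antisense primers: the reverse complement of each is in frame and ends with Arg followed by Phe-Val, the first two residues of the insulin B chain. The Python sketch below translates them with the standard genetic code; the helper names are assumptions made for the example, and Table 1 remains the authoritative source for the sequences.

```python
# Sketch: translate the reverse complements of DTR8, DTR10 and DTR12
# (standard genetic code; helper names are illustrative).

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def reverse_complement(seq):
    pairs = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(pairs[base] for base in reversed(seq))


def translate(seq):
    """Translate complete codons of seq, reading from its first base."""
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))


ANTISENSE_PRIMERS = {
    "pINT67d": "CACAAATCGAGTTGCTGTTGATGTTGT",  # DTR8
    "pINT68d": "CACAAATCGTGCTGTTGATGTTGTTGC",  # DTR10
    "pINT69d": "CACAAATCGTGTTGATGTTGTTGCCAT",  # DTR12
}

peptides = {
    name: translate(reverse_complement(primer))
    for name, primer in ANTISENSE_PRIMERS.items()
}
for name, peptide in peptides.items():
    print(name, peptide)
# pINT67d TTSTATRFV
# pINT68d ATTSTARFV
# pINT69d MATTSTRFV
```

DTR12 reaches back to the start codon, so its translation shows the whole seven-residue ballast (Met-Ala-Thr-Thr-Ser-Thr-Arg) in front of Phe-Val; the other two primers cover only the 3' part of their longer ballast sequences.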

Example 9 

Construction of plasmid pINT72d

Plasmid pINT72d is a derivative of plasmid
pINT69d in which the entire C-peptide gene region,
with the exception of the first codon for the amino
acid arginine, is deleted. Consequently, this results
in a "miniproinsulin derivative" with an arginine
residue instead of a C-chain. With plasmid pINT69d as
a starting point, the procedure described in Example
6 is also used for its construction but with the two
oligonucleotide pairs

TIR: 5'-CTG AAA TGA GCT GTT GAC-3'

and

Insu28: 5'-GAT GCC GCG GGT CTT GGG TGT-3'

or

Insu27: 5'-AAG ACC CGC GGC ATC GTG GAG-3'

and

Insu11: 5'-TCA TGT TTG ACA GCT TAT CAT-3'.

Example 10 

Construction of plasmids pINT73d, pINT88d and pINT89d 

Plasmid pINT73d is a derivative of plasmid
pINT69d (Example 8), in which the insulin precursor
gene is arranged two times in succession. The plasmid
therefore results in the formation of a polycistronic
mRNA, which can double the yield. For its
construction, a PCR reaction is carried out with
plasmid pINT69d and the two oligonucleotides

Insu29: 5'-CTA GTA CTC GAG TTC AC-3'

and

Insu11: 5'-TCA TGT TTG ACA GCT TAT CAT-3'.

This gives rise to a fragment with the insulin
precursor gene and the pertinent ribosome binding site
which in its 5'-end region has a cleavage site for
enzyme XhoI and in its 3'-end region a cleavage site
for SalI. The fragment is cut with the two
above-mentioned enzymes and purified. Plasmid pINT69d
is then linearized with SalI, the two DNA ends
produced are dephosphorylated with phosphatase (from
calf intestine) and ligated with the fragment from the
PCR reaction to yield plasmid pINT73d.

In an analogous manner, plasmids pINT88d and
pINT89d are obtained when plasmid pINT72d
(Example 9) is modified by arranging the
"miniproinsulin gene" twice or thrice in sequence.

Example 11 

Construction of plasmid pINL41d

The starting plasmid pRUD3 has a structure
analogous to that of plasmid pGATTP. However, instead
of the trp-promoter region, it contains a tac-promoter
region which is flanked by cleavage sites for enzymes
EcoRI and Nco. The plasmid is cut with EcoRI,
whereupon the protruding ends of the cleavage site are
filled in with Klenow polymerase. Cutting is
performed subsequently with Nco and the ensuing
promoter fragment is isolated.

The trp-promoter of plasmid pINT41d is flanked by
cleavage sites for enzymes PvuII and Nco. Since the
plasmid has an additional cleavage site for PvuII, it
is completely cut with Nco, but only partially with
PvuII. The vector fragment, which is missing only the
promoter region, is then isolated from the ensuing
fragments. This is then ligated with the tac-promoter
fragment to yield plasmid pINL41d.

Example 12

Construction of plasmid pL41c

Plasmid pPL-lambda (which can be obtained from
Pharmacia) has a lambda-pL-promoter region. The
latter is flanked by the nucleotide
sequences

5' GATCTCTCACCTACCAAACAAT 3'

and

5' AGCTAACTGACAGGAGAATCC 3'.

Oligonucleotides

LPL3: 5' ATGAATTCGATCTCTCACCTACCAAACAAT 3'

and

LPL4: 5' TTGCCATGGGGATTCTCCTGTCAGTTAGCT 3'

are prepared for additional flanking of the promoter
region with cleavage sites for enzymes EcoRI and Nco.
A PCR is carried out with these oligonucleotides and
pPL-lambda and the resulting promoter fragment is cut
with EcoRI and Nco and isolated. Plasmid pINL41d is
then also cut with these two enzymes and the ensuing
vector fragment, which has no promoter, is then
ligated with the lambda-pL-promoter fragment to yield
plasmid pL41c.
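
The design of the two primers can be checked directly: each consists of a few extra 5' nucleotides, a restriction site, and a part that anneals to one of the flanking sequences. The Python sketch below performs this check; the helper names are assumptions made for the example, and the recognition sequences used (GAATTC for EcoRI, CCATGG for the enzyme referred to as Nco) are the standard ones rather than something stated in this example.

```python
# Sketch: split each PCR primer into 5' extension, restriction site and
# annealing part (helper names are illustrative).

def reverse_complement(seq):
    pairs = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(pairs[base] for base in reversed(seq))


ECORI_SITE = "GAATTC"
NCO_SITE = "CCATGG"

FLANK_UPSTREAM = "GATCTCTCACCTACCAAACAAT"
FLANK_DOWNSTREAM = "AGCTAACTGACAGGAGAATCC"

LPL3 = "ATGAATTCGATCTCTCACCTACCAAACAAT"
LPL4 = "TTGCCATGGGGATTCTCCTGTCAGTTAGCT"


def split_primer(primer, site):
    """Return (5' extension, site, annealing part) for the first site found."""
    start = primer.find(site)
    if start < 0:
        raise ValueError("restriction site not found in primer")
    end = start + len(site)
    return primer[:start], primer[start:end], primer[end:]


lpl3_parts = split_primer(LPL3, ECORI_SITE)
lpl4_parts = split_primer(LPL4, NCO_SITE)

# LPL3 anneals as the sense primer, LPL4 as the antisense primer.
print(lpl3_parts, lpl3_parts[2] == FLANK_UPSTREAM)
print(lpl4_parts, lpl4_parts[2] == reverse_complement(FLANK_DOWNSTREAM))
```

Both comparisons evaluate to True, so the amplified promoter fragment carries an EcoRI site upstream and an Nco site downstream, as the text describes.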

Example 13

Construction of plasmid pL41d 

The trp-transcription terminator located between
the resistance gene and the fusion protein gene in
plasmid pL41c is not effective in E. coli strains that
are suitable for fermentation (e.g. E. coli N4830-1).
For this reason, a polycistronic mRNA and with it a
large quantity of resistance gene product are formed
in fermentation. To prevent this side reaction, the
trp-terminator sequence is replaced by an effective
terminator sequence of the E. coli rrnB-operon.
Plasmid pANGMA has a structure similar to that of
plasmid pINT41d, but it has an angiogenin gene instead
of the fusion protein gene and an rrnB-terminator
sequence (from commercial plasmid pKK223-3, which can
be obtained from Pharmacia) instead of the
trp-terminator sequence. The plasmid is cut with PvuI
and SalI and the fragment containing the
rrnB-terminator is isolated. Plasmid pL41c is then
also cut with these two enzymes and the fragment
containing the insulin gene is isolated. The two
isolated fragments are then ligated to yield plasmid
pL41d.

Example 14 

Construction of plasmid pINTLI 

To prepare a plasmid for general use in the
expression of fusion proteins, the proinsulin gene of
plasmid pINT41d is replaced by a polylinker sequence.
This gene is flanked by cleavage sites for enzymes
MluI and SalI. The plasmid is therefore cut with the
help of the two above-mentioned enzymes and the vector
fragment is isolated. This is then ligated, to yield
plasmid pINTLI, with the following two synthetic
oligonucleotides

          BstEII      AccI     EcoRI       KpnI     BamHI
5' CGCGCCTGGTTACCTCGAGGTATACTACGAATTCGAGCTCGGTACCCGGGGATCC
3'     GGACCAATGGAGCTCCATATGATGCTTAAGCTCGAGCCATGGGCCCCTAGG
                XhoI                 SacI      XmaI

         SphI         XbaI
   CTGCAGGCATGCAAGCTTGTCTAGAC 3'
   GACGTCCGTACGTTCGAACAGATCTGAGCT 5'
   PstI        HindIII       (SalI).

Example 15 

Insertion of a gene coding for HMG CoA-reductase 
(active domain) in pINTLI and expression of the fusion 
protein 

Table 3 represents the DNA and amino acid
sequence of the gene HMG CoA-reductase. The synthetic
gene for HMG CoA-reductase known from EP-A 0 292 803
(herein incorporated by reference) contains a cleavage
site for BstEII in the region of amino acids Leu and
Val in positions 3 and 4 (see Table 3). A protruding
sequence corresponding to enzyme XbaI occurs at the
end of the gene (in the noncoding area). The
corresponding cleavage sites in the polylinker of
plasmid pINTLI are in the same reading frame. Both
cleavage sites are in each case singular.

Plasmid pUH10 contains the complete HMG gene
(HMG fragments I, II, III, and IV), corresponding to
the DNA sequence of Table 3. Construction of pUH10
(figure 2) is described in EP-A 0 292 803, herein
incorporated by reference. Briefly, special plasmids
are prepared for the subcloning of the gene fragments
HMG I to HMG IV and for the construction of the
complete gene. These plasmids are derived from the
commercially available vectors pUC18, pUC19 and
M13mp18 or M13mp19, with the polylinker region having
been replaced by a new synthetic polylinker
corresponding to DNA sequence VI

         Nco               EcoRI     HindIII BamHI     XbaI
VI-1a
5' AAT TGC CAT GGG CAT GCG GAA TTC CAA GCT TTG GAT CCA TCT AGA GGG
3'      CG GTA CCC GTA CGC CTT AAG GTT CGA AAC CTA GGT AGA TCT CCC TCG A
VI-1b



These new plasmids have the advantage that, in
contrast to the pUC and M13mp plasmids, they allow the
cloning of DNA fragments having the protruding
sequences for the restriction enzyme Nco. Moreover,
the recognition sequences for the cleavage sites Nco,
EcoRI, HindIII, BamHI, and XbaI are contained in the
vectors in exactly the sequence in which they are
present in the complete gene HMG, which facilitates
the sequential cloning and the construction of this
gene. Thus it is possible to subclone the gene
fragments HMG I to HMG IV in the novel plasmids.
After the gene fragments have been amplified, it is
possible for the latter to be combined to give the
complete gene (see below).

a. Preparation of vectors which contain DNA sequence VI

DNA sequence VI may be prepared by standard
techniques. The commercially available plasmid pUC18
(or pUC19, M13mp18 or M13mp19) is opened with the
restriction enzymes EcoRI/HindIII as stated by the
manufacturer. The digestion mixture is fractionated
by electrophoresis on a 1% agarose gel. The plasmid
bands which have been visualized by ethidium bromide
staining are cut out and eluted from the agarose by
electrophoresis. 20 fmol of the residual plasmid thus
obtained are then ligated with 200 fmol of the DNA
fragment corresponding to DNA sequence VI at room
temperature overnight. A new cloning vector pSU18 (or
pSU19, M13mUS18 or M13mUS19) is obtained. In contrast
to the commercially available starting plasmids, the
new plasmids can be cut with the restriction enzyme
Nco. The restriction enzymes EcoRI and HindIII
likewise cut the plasmids only once because the
polylinker which is inserted via the EcoRI and HindIII
cleavage sites destroys these cleavage sites which are
originally present.

b. Preparation of the hybrid plasmids which
contain the gene fragments HMG I to HMG IV

i) Plasmid containing the gene fragment HMG I 

The plasmid pSU18 is cut open with the
restriction enzymes EcoRI and Nco in analogy to the
description in Example 15 (a) above, and is ligated
with the gene fragment I which has previously been
phosphorylated.

ii) Plasmid containing the gene fragment HMG II

The plasmids with the gene subfragments HMG II-1,
II-2 and II-3 are subjected to restriction enzyme
digestion with EcoRI/MluI, MluI/BssHII or
BssHII/HindIII to isolate the gene fragments HMG II-1,
HMG II-2 or HMG II-3, respectively. The latter are
then ligated in a known manner into the plasmid pSU18
which has been opened with EcoRI/HindIII.

iii) Plasmid containing the gene fragment HMG III

The plasmids with the gene subfragments HMG
III-1 and III-3 are digested with the restriction
enzymes EcoRI/HindIII and then cut with Sau96I to
isolate the gene fragment HMG III-1, or with
BamHI/BanII to isolate the gene fragment HMG III-3.
These fragments can be inserted with the HMG III-2
fragment into a pSU18 plasmid which has been opened
with HindIII/BamHI.

iv) Plasmid containing the gene fragment HMG IV

The plasmids with the gene subfragments HMG
IV-(1+2) and IV-(3+4) are opened with the restriction
enzymes EcoRI/BamHI and EcoRI/XbaI, respectively, and
the gene fragments HMG IV-(1+2) and HMG IV-(3+4) are
purified by electrophoresis. The resulting fragments
are then ligated into a pSU18 plasmid which has been
opened with BamHI/XbaI and in which the EcoRI cleavage
site has previously been destroyed with S1 nuclease as
described below. A hybrid plasmid which still
contains an additional AATT nucleotide sequence in the
DNA sequence IV is obtained. The hybrid plasmid is
opened at this point by digestion with the restriction
enzyme EcoRI, and the protruding AATT ends are removed
with S1 nuclease. For this purpose, 1 µg of plasmid
is, after EcoRI digestion, incubated with 2 units of
S1 nuclease in 50 mM sodium acetate buffer (pH 4.5),
containing 200 mM NaCl and 1 mM zinc chloride, at 20°C
for 30 minutes. The plasmid is recircularized in a
known manner via the blunt ends. A hybrid plasmid
which contains the gene fragment IV is obtained.

c. Construction of the hybrid plasmid pUH10
which contains the DNA sequence V

The hybrid plasmid with the gene fragment HMG
I is opened with EcoRI/HindIII and ligated with the
fragment HMG II which is obtained by restriction
enzyme digestion of the corresponding hybrid plasmid
with EcoRI/HindIII. The resulting plasmid is then
opened with HindIII/BamHI and ligated with the
fragment HMG III which can be obtained from the
corresponding plasmid using HindIII/BamHI. The
plasmid obtained in this way is in turn opened with
BamHI/XbaI and linked to the fragment HMG IV which is
obtained by digestion of the corresponding plasmid
with BamHI/XbaI. The hybrid plasmid pUH10, which
contains the complete HMG gene, corresponding to DNA
sequence V, is obtained. Figure 2 shows the map of
pUH10 diagrammatically, with "ori" and "Ap^r" indicating
the orientation in the residual plasmid corresponding
to pUC18.

If pINTLI is cut with BstEII and XbaI and the
large fragment is isolated, and if, on the other hand,
plasmid pUH10 (figure 2) is digested with the same
enzymes and the fragment which encompasses most of the
DNA sequence V from this plasmid is isolated, ligation
of the two fragments gives a plasmid
which codes for a fusion protein in which arginine
follows the first eight amino acids in the ballast
sequence of pINT41d (Table 1), which is followed,
starting with Leu3, by the structural gene of the
active domain of
HMG CoA-reductase. For purposes of comparison, the
two initial plasmids are cut with enzymes Nco and XbaI
and the corresponding fragments are ligated together,
yielding a plasmid which codes, immediately after the
start codon, for the active domain of HMG CoA-reductase
(in accordance with DNA sequence V of EP-A 0 292 803,
see Table 3).

Expression of the coded proteins occurs according
to Example 4. Following the breakup of the cells,
centrifugation is performed, whereupon the expected
protein of approximately 55 kDa is determined in the
supernatant by gel electrophoresis. The band for the
fusion protein is much more intense here than for
the protein expressed directly. Individual portions of
100 µl of the supernatant are tested in undiluted
form, in a dilution of 1:10 and in a dilution of 1:100
for the formation of mevalonate. As an additional
comparison, the fusion protein according to Example 4
(fusion protein with proinsulin constituent) is
tested; no activity is apparent in any of the three
concentrations. The fusion protein with the HMG CoA-
reductase constituent exhibits maximum activity in all
three dilutions, while the product of the direct
expression shows graduated activity governed by the
concentration. This indicates better expression of
the fusion protein by a factor of at least 100.

Example 16

Construction of plasmid pB70 

Plasmid pINT41d is split with MluI and SalI and
the large fragment is isolated. Plasmid pIK4 shown in
figure 3a contains a gene for "mini-proinsulin," the
C chain of which consists of arginine only.

The construction of this plasmid has previously
been described ... (herein incorporated
by reference). Briefly, the commercial plasmid pUC19
is opened using the restriction enzymes KpnI and PstI
and the large fragment (figure 3-(1)) is separated
through a 0.8% strength "Seaplaque" gel. This
fragment is reacted with T4 DNA ligase using the DNA
(figure 3-(2)) synthesized according to Table 4.
Table 4 shows the sequence of gene fragment IK I,
while Table 5 represents the sequence of gene fragment
IK II.

This ligation mixture then is incubated with
competent E. coli 79/02 cells. The transformation
mixture is plated out on IPTG/Xgal plates which
contain 20 mg/l of ampicillin. The plasmid DNA is
isolated from the white colonies and characterized by
restriction and DNA sequence analysis. The desired
plasmids are called pIK1 (figure 3).

Accordingly, the DNA (figure 3-(5)) according to
Table 5 is ligated into pUC19 which has been opened
using PstI and HindIII (figure 3-(4)). The plasmid
pIK2 (figure 3) is obtained.

The DNA sequences (2) and (5) of figure 3
according to Tables 4 and 5 are reisolated from the
plasmids pIK1 and pIK2 and ligated with pUC19, which
has been opened using KpnI and HindIII (figure 3-(7)).
The plasmid pIK3 (figure 3) is thus obtained, which
codes for a modified human insulin sequence.

The plasmid pIK3 is opened using MluI and SpeI
and the large fragment (figure 3a-(9)) is isolated.
This is ligated with the DNA sequence (10)

 B30       A1  A2  A3  A4  A5  A6  A7   A8    A9
(Thr)(Arg) Gly Ile Val Glu Gln Cys Cys (Thr) (Ser)   (10)
5'  CG CGT GGT ATC GTT GAA CAA TGT TGT A       3'
3'       A CCA TAG CAA CTT GTT ACA ACA TGA TC  5'
   (MluI)                               (SpeI)

which supplements the last codon of the B chain (B30)
by one arginine codon, replaces the excised codons
for the first 7 amino acids of the A chain and
supplements the codons for the amino acids 8 and 9 of
this chain. The plasmid pIK4 (figure 3a) is thus
obtained, which codes for human mini-proinsulin.
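
The two strands of DNA sequence (10) pair over 23 nucleotides and leave a protruding 5' end on each side. The Python sketch below determines these overhangs from the strands as printed; the helper names are assumptions made for the example.

```python
# Sketch: find the paired region and the two 5' overhangs of a short
# synthetic duplex (helper names are illustrative).

def complement(seq):
    pairs = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(pairs[base] for base in seq)


def sticky_ends(top, bottom_3_to_5):
    """Return the 5' overhangs (top strand, bottom strand), both written 5'->3'.

    `top` is written 5'->3'; `bottom_3_to_5` is written 3'->5' as in the text.
    Assumes both strands have the same length and the same overhang length.
    """
    for k in range(len(top)):
        paired = len(top) - k
        if complement(top[k:]) == bottom_3_to_5[:paired]:
            return top[:k], bottom_3_to_5[paired:][::-1]
    raise ValueError("strands do not pair")


TOP = "CGCGTGGTATCGTTGAACAATGTTGTA"
BOTTOM_3_TO_5 = "ACCATAGCAACTTGTTACAACATGATC"

print(sticky_ends(TOP, BOTTOM_3_TO_5))   # ('CGCG', 'CTAG')
```

CGCG and CTAG are the protruding ends expected after cutting with MluI and SpeI, respectively, which is consistent with ligation into the large MluI/SpeI fragment of pIK3.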

In Tables 4 and 5, the B- and A-chains of the
insulin molecule are in each case indicated by the
first and last amino acid. Next to the coding region
in gene fragment IK II, there is a cleavage site for
SalI which will be utilized in the following
construction.

Plasmid pIK4 is cut with HpaI and SalI and the
gene coding for "mini-proinsulin" is isolated. This gene
is ligated with the above-mentioned large fragment of
pINT41d and the following synthetic DNA sequence.

                      B1
(Thr) Arg Met Gly Arg Phe
   CG CGT ATG GGC CGT TTC GTT
        A TAC CCG GCA AAG CAA
(MluI)                 (HpaI)

This gives rise to plasmid pB70, which codes for a fusion
protein in which the ballast sequence (Table 1, line
1) is followed by amino acid sequence Met-Gly-Arg
which is followed by the amino acid sequence of the
"mini-proinsulin".

TABLE 4: Gene fragment IK I (2)

                        B1
                        Phe
           10           20           30            40
     CT TTG GAC AAG AGA TTC GTT AAC CAA CAC TTG TGT GGT TCT CAC
CAT GGA AAC CTG TTC TCT AAG CAA TTG GTT GTG AAC ACA CCA AGA GTG
(KpnI)                      HpaI

50           60            70           80           90
TTG GTG GAA GCG TTG TAC TTG GTT TGT GGT GAG CGT GGT TTC TTC
AAC CAC CTT CGC AAC ATG AAC CAA ACA CCA CTC GCA CCA AAG AAG

                Thr Arg Lys Gly Ser Leu
      100          110          120
TAC ACT CCA AAG ACG CGT AAG GGT TCT CTG CA
ATG TGA GGT TTC TGC GCA TTC CCA AGA G
                MluI                (PstI)

TABLE 5: Gene fragment IK II (5)

               A1
   Gln Lys Arg Gly
         130          140          150           160
     G AAG CGT GGT ATC GTT GAA CAA TGT TGT ACT AGT ATC TGT TCT
AC GTC TTC GCA CCA TAG CAA CTT GTT ACA ACA TGA TCA TAG ACA AGA
(PstI)                                     SpeI

                                Asn
170          180           190          200          210
TTG TAC CAG CTG GAA AAC TAC TGT AAC TGA TAG TCG ACC CAT GGA
AAC ATG GTC GAC CTT TTG ATG ACA TTG ACT ACT AGC TGG GTA CCT TCG A
                                                       (HindIII)

Example 17

By using the oligonucleotides listed below,
plasmids pINT90d to pINT96d are obtained in analogy to
the previous examples. An asterisk indicates the same
encoded amino acid in the ballast constituent as in
pINT41d.

pINT92d encodes a double mutation in the insulin
derivative encoded by the plasmid pINT72d since the
codon for Arg at the end of the ballast constituent
and in the "mini C chain" is substituted by the codon
for Met. Thus the expressed preproduct can be cleaved
with cyanogen bromide.

pINT90d: ******GNSA* (variant of pINT69d)
TIR: 5'-CTGAAATGAGCTGTTGAC-3'
and
Insu50: 5'-TGCCGAATTTCCTGTTGATGTTGTTGC-3'
or
Insu49: 5'-GGAAATTCGGCACGATTTGTGAACCAG-3'
and
Insu11: 5'-TCATGTTTGACAGCTTATCAT-3'

pINT91d: ******GNSA* (variant of pINT72d)
TIR: 5'-CTGAAATGAGCTGTTGAC-3'
and
Insu50: 5'-TGCCGAATTTCCTGTTGATGTTGTTGC-3'
or
Insu49: 5'-GGAAATTCGGCACGATTTGTGAACCAG-3'
and
Insu11: 5'-TCATGTTTGACAGCTTATCAT-3'

pINT92d: (double mutant of pINT72d)
Insu56: 5'-TCGACCATGGCAACAACATCAACAATGTTTGTG-3'
and
Insu58: 5'-GATGCCCATGGTCTT-3'
or
Insu57: 5'-AAGACCATGGGCATC-3'
and
Insu11: 5'-TCATGTTTGACAGCTTATCAT-3'

pINT93d: ****** (variant of pINT68d)
Insu53: 5'-ACCATGGCAACAACATCAACAAAACGATTTGTG-3'
and
Insu11: 5'-TCATGTTTGACAGCTTATCAT-3'

pINT94d: ****** (variant of pINT68d)
Insu54: 5'-ACCATGGCAACAACATCAACACCACGATTTGTG-3'
and
Insu11: 5'-TCATGTTTGACAGCTTATCAT-3'

pINT95d: ****** (variant of pINT68d)
Insu55: 5'-TCGACCATGGCAACAACATCAACAATGCGATTTGTG-3'
and
Insu11: 5'-TCATGTTTGACAGCTTATCAT-3'

pINT96d: ****** (variant of pINT68d)
Insu71: 5'-ACCATGGCAACAACATCAACAGGACGATTTGTG-3'
and
Insu11: 5'-TCATGTTTGACAGCTTATCAT-3'

We claim:

1. A process for the preparation of fusion
proteins, which fusion proteins contain a desired
protein and a ballast constituent, which process
comprises

(a) constructing a mixed oligonucleotide
which codes for the said ballast constituent,

(b) inserting the said mixed oligonucleotide
into a vector so that it is functionally linked to a
regulatory region and to the structural gene coding
for the said desired protein,

(c) transforming host cells with the so-
obtained vector population and

(d) selecting from the transformants one or
more clones expressing a fusion protein in high yield.

2. The process as claimed in claim 1, wherein
the said oligonucleotide codes at its 3' end of the
coding strand for an amino acid or for a group of
amino acids which allows an easy cleavage of the said
desired protein from the said ballast constituent.

3. The process as claimed in claim 2, wherein
said cleavage is an enzymatic cleavage.

4. The process as claimed in claim 1, wherein
the said oligonucleotide is designed so that it leads
to a fusion protein which is soluble or which easily
can be solubilized.

5. The process as claimed in claim 1, wherein
the said oligonucleotide is designed so that the
ballast constituent does not interfere with folding of
the said desired protein.

6. The process as claimed in claim 1, wherein
the said oligonucleotide contains the DNA sequence
(coding strand)

(DCD)x

in which D is A, G or T and x is 4 to 12.

7. The process as claimed in claim 6, wherein x
is 4 to 8.

8. The process as claimed in claim 5, wherein
the said oligonucleotide has the sequence (coding
strand)

ATG (DCD)y (NNN)z

wherein N stands for identical or different
nucleotides, excluding stop codons for NNN, z is 1 to
4 and y + z is 6 to 12, y being at least 4.

9. The process as claimed in claim 8, wherein y
+ z is 6 to 10.

10. The process as claimed in claim 8, wherein
y is 5 to 8 and z is 1.

11. The process as claimed in claim 6, wherein
the said oligonucleotide has the sequence (coding
strand)

ATG GCW (DCD)4-8 CGW

in which W is A or T.

12. The process as claimed in claim 11, wherein
the said oligonucleotide has the sequence 

ATG GCA (DCD) W CGW. 

13. The process as claimed in claim 1, wherein
the said desired protein is a proinsulin and wherein
the said oligonucleotide (coding strand) is

ATG GCW (DCD)y' ACG CGW or
ATG GCD (DCD)y' ACG CGT,

wherein D is A, G or T, W is A or T and y' is 3 to 6.

14. The process as claimed in claim 13, wherein
y' is 4 to 6.

15. The process as claimed in claim 2, wherein
the desired protein is a proinsulin.

16. The process as claimed in claim 15, wherein
the proinsulin has a C chain which is different from
that of human proinsulin.

17. The process as claimed in claim 16, wherein
the gene for C chain is designed so that the C chain
can be split off together with the said ballast
constituent.

18. The process as claimed in claim 17, wherein
the C chain consists of arginine.
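
As an illustration of the degenerate motif (DCD)x, with D being A, G or T, the following Python sketch enumerates the codons the motif covers, the amino acids they encode under the standard genetic code, and the number of distinct DNA sequences in a pool for a given x. The helper names are assumptions made for the example and are not part of the claimed process.

```python
# Sketch: codons, encoded amino acids and pool size for the motif (DCD)x
# (standard genetic code; helper names are illustrative).
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

D = "AGT"                                              # D is A, G or T
dcd_codons = [first + "C" + third for first, third in product(D, repeat=2)]
encoded = sorted({CODON_TABLE[codon] for codon in dcd_codons})


def dna_pool_size(x):
    """Number of distinct DNA sequences of the form (DCD)x."""
    return len(dcd_codons) ** x


def peptide_pool_size(x):
    """Number of distinct peptides encoded by (DCD)x."""
    return len(encoded) ** x


print(dcd_codons)       # nine codons: ACA, ACG, ACT, GCA, GCG, GCT, TCA, TCG, TCT
print(encoded)          # ['A', 'S', 'T'], i.e. Ala, Ser, Thr
print(dna_pool_size(4), peptide_pool_size(4))   # 6561 81
```

Every codon of the motif therefore encodes alanine, serine or threonine, and the pool of DNA sequences grows ninefold with each additional DCD unit.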